Crises in financial markets affect humans worldwide. Detailed market data on trading decisions reflect some 
of the complex human behavior that has led to these crises. We suggest that massive new data sources 
resulting from human interaction with the Internet may offer a new perspective on the behavior of market 
participants in periods of large market movements. By analyzing changes in Google query volumes for 
search terms related to finance, we find patterns that may be interpreted as "early warning signs" of stock 
market moves. Our results illustrate the potential that combining extensive behavioral data sets offers for a 
better understanding of collective human behavior. 

The increasing volumes of 'big data' reflecting various aspects of our everyday activities represent a vital new 
opportunity for scientists to address fundamental questions about the complex world we inhabit 1–7 . 
Financial markets are a prime target for such quantitative investigations 8,9 . Movements in the markets exert 
immense impacts on personal fortunes and geopolitical events, generating considerable scientific attention to this 
subject 10–19 . For example, a range of recent studies have focused on modeling financial markets 20–25 and on 
performing network analyses 26–29 . 

At their core, financial trading data sets reflect the myriad of decisions taken by market participants. According 
to Herbert Simon, actors begin their decision making processes by attempting to gather information 30 . In today's 
world, information gathering often consists of searching online sources. Recently, the search engine Google has 
begun to provide access to aggregated information on the volume of queries for different search terms and how 
these volumes change over time, via the publicly available service Google Trends. In the present study, we 
investigate the intriguing possibility of analyzing search query data from Google Trends to provide new insights 
into the information gathering process that precedes the trading decisions recorded in the stock market data. 

A recent investigation has shown that the number of clicks on search results stemming from a given country 
correlates with the amount of investment in that country 31 . Further studies exploiting the temporal dimension of 
Google Trends data have demonstrated that changes in query volumes for selected search terms mirror changes in 
current numbers of influenza cases 32 and current volumes of stock market transactions 33 . This demonstration of a 
link between stock market transaction volume and search volume has also been replicated using Yahoo! data 34 . 
Choi and Varian 35 have shown that data from Google Trends can be linked to current values of various economic 
indicators, including automobile sales, unemployment claims, travel destination planning and consumer con- 
fidence. A very recent study has shown that Internet users from countries with a higher per capita GDP are more 
likely to search for information about years in the future than years in the past 36 . 

Here, we suggest that within the time period we investigate, Google Trends data did not only reflect the current 
state of the stock markets 33 but may have also been able to anticipate certain future trends. Our findings are 
consistent with the intriguing proposal that notable drops in the financial market are preceded by periods of 
investor concern. In such periods, investors may search for more information about the market, before eventually 
deciding to buy or sell. Our results suggest that, following this logic, during the period 2004 to 2011 Google Trends 
search query volumes for certain terms could have been used in the construction of profitable trading strategies. 

Results 

We analyze the performance of a set of 98 search terms. We included terms related to the concept of stock 
markets, with some terms suggested by the Google Sets service, a tool which identifies semantically related 
keywords. The set of terms used was therefore not arbitrarily chosen, as we intentionally introduced some 
financial bias. We explain our strategy based on changes in search volume with reference to the term debt, a 
keyword with an obvious semantic connection to the most recent financial crisis, and overall the term which 
performed best in our analyses. 

Figure 1 | Search volume data and stock market moves. Time series of closing prices p(t) of the Dow Jones Industrial Average (DJIA) on the first day of 
trading in each week t covering the period from 5 January 2004 until 22 February 2011. The color code corresponds to the relative search volume changes 
for the search term debt, with Δt = 3 weeks. Search volume data are restricted to requests of users localized in the United States of America. 

To uncover the relationship between the volume of search queries 
for a specific term and the overall direction of trader decisions, we 
analyze closing prices p(t) of the Dow Jones Industrial Average (DJIA) 
on the first trading day of week t. We use Google Trends to determine 
how many searches n(t − 1) have been carried out for a specific search 
term such as debt in week t - 1, where Google defines weeks as ending 
on a Sunday, relative to the total number of searches carried out on 
Google during that time. We find that search volume data change 
slightly over time due to Google's extraction procedure. For each 
search term, we therefore average over three realizations of its search 
volume time series, based on three independent data requests in 
consecutive weeks. The variability of Google Trends data across dif- 
ferent dates of access is irrelevant for our results, and it can be shown 
that the data are consistent with reported real world events (see 
Fig. S1 in the Supplementary Information). 
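
The averaging over repeated data requests described above could be sketched as follows. This is a minimal illustration only: the function name average_realizations and the list-of-lists input layout are assumptions, not part of the original analysis.

```python
def average_realizations(realizations):
    """Average several search volume time series week by week.

    `realizations` is a list of equally long lists, one per data request
    (hypothetical layout; the paper uses three requests per search term).
    """
    lengths = {len(series) for series in realizations}
    if len(lengths) != 1:
        raise ValueError("all realizations must cover the same weeks")
    # zip(*...) groups the values that belong to the same week.
    return [sum(values) / len(values) for values in zip(*realizations)]
```

For example, three requests that return 1, 3 and 5 for the same week yield an averaged value of 3 for that week.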

To quantify changes in information gathering behavior, we use the 
relative change in search volume: Δn(t, Δt) = n(t) − N(t − 1, Δt) 
with N(t − 1, Δt) = (n(t − 1) + n(t − 2) + ... + n(t − Δt))/Δt, where 
t is measured in units of weeks. In Fig. 1, we depict relative search 
volume changes for the term debt, and their relationship to DJIA 
closing prices. 
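
As an illustration, the relative change in search volume defined above can be computed as in the following sketch. The function name relative_change and the use of a 0-based list indexed by week are assumptions made for this example.

```python
def relative_change(n, t, dt):
    """Return n(t) minus the mean of the dt preceding weeks.

    n  : list of weekly search volumes, indexed by week (0-based, an assumption)
    t  : index of the week of interest
    dt : number of preceding weeks in the moving average
    """
    if dt < 1 or t - dt < 0 or t >= len(n):
        raise ValueError("need dt >= 1 and dt full weeks of history before week t")
    # N(t - 1, dt): average of n(t - 1), n(t - 2), ..., n(t - dt)
    baseline = sum(n[t - dt:t]) / dt
    return n[t] - baseline
```

With weekly volumes 10, 20, 30 and 50, the change in the last week for a window of three weeks is 50 − (10 + 20 + 30)/3 = 30.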

To investigate whether changes in information gathering behavior 
as captured by Google Trends data were related to later changes in 
stock price in the period between 2004 and 2011, we implement a hypothetical 
investment strategy for a portfolio using search volume data, 
called 'Google Trends strategy' in the following. Profit can only be 
made in a trading strategy if at least some future changes in the stock 
price are correctly anticipated, in particular around large market 
movements. We implement this strategy by selling the DJIA at the 
closing price p(t) on the first trading day of week t, if Δn(t − 1, Δt) > 
0, and buying the DJIA at price p(t + 1) at the end of the first trading 
day of the following week. Note that mechanisms exist which make it 
possible to sell assets in financial markets without first owning them. 
If instead Δn(t − 1, Δt) < 0, then we buy the DJIA at the closing price 
p(t) on the first trading day of week t and sell the DJIA at price 
p(t + 1) at the end of the first trading day of the coming week. At 
the beginning of trading, we set the value of all portfolios to an 
arbitrary value of 1. If we take a 'short position' (selling at the closing 
price p(t) and buying back at price p(t + 1)), then the cumulative 
return R changes by log(p(t)) − log(p(t + 1)). If we take a 'long 
position' (buying at the closing price p(t) and selling at price 
p(t + 1)), then the cumulative return R changes by log(p(t + 1)) 
− log(p(t)). In this way, buy and sell actions have symmetric impacts 
on the cumulative return R of a strategy's portfolio. In using this 
approach to analyze the relationship between Google search volume 
and stock market movements, we neglect transaction fees, since the 
maximum number of transactions per year when using our strategy 
is only 104, allowing a closing and an opening transaction per week. 
We of course do not dispute that such transaction fees would impact 
profit in a real world implementation. 
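
The trading rule described above can be sketched in a few lines of Python. This is an illustrative reading of the rule, not the code used for the analysis: the function name, the 0-based weekly lists, and the choice to take no position when the search volume change is exactly zero (a case the text does not address) are assumptions.

```python
import math


def google_trends_strategy_return(prices, volumes, dt):
    """Cumulative log return R of the search-volume strategy.

    prices[t]  : DJIA closing price on the first trading day of week t
    volumes[t] : relative search volume in week t
    dt         : length of the moving-average window in weeks
    """
    if len(prices) != len(volumes):
        raise ValueError("prices and volumes must cover the same weeks")
    total = 0.0
    for t in range(dt + 1, len(prices) - 1):
        # Change in week t - 1: n(t - 1) minus the mean of n(t - 2) ... n(t - 1 - dt)
        baseline = sum(volumes[t - 1 - dt:t - 1]) / dt
        change = volumes[t - 1] - baseline
        step = math.log(prices[t + 1]) - math.log(prices[t])
        if change > 0:
            total -= step  # short position: sell at p(t), buy back at p(t + 1)
        elif change < 0:
            total += step  # long position: buy at p(t), sell at p(t + 1)
        # change == 0: no position is taken (assumption)
    return total
```

Because returns are accumulated as differences of logarithms, a long and a short position over the same week contribute values of equal size and opposite sign, which is the symmetry noted in the text.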

In Fig. 2, the performance of the Google Trends strategy based on 
the search term debt is depicted by a blue line, whereas dashed lines 
indicate the standard deviation of the cumulative return from a 
strategy in which we buy and sell the market index in an uncorre- 
lated, random manner ('random investment strategy'). The standard 
deviation is derived from simulations of 10,000 independent realiza- 
tions of the random investment strategy. Fig. 2 shows that the use of 
the Google Trends strategy, based on the search term debt and Δt = 3 
weeks, would have increased the value of a portfolio by 326%. The 
performance of Google Trends strategies based on all other search 
terms that we analyze is depicted in Figures S3-S100 in the 
Supplementary Information. 

Figure 2 | Cumulative performance of an investment strategy based on Google Trends data. Profit and loss for an investment strategy based on the 
volume of the search term debt, the best performing keyword in our analysis, with Δt = 3 weeks, plotted as a function of time (blue line). This is compared 
to the "buy and hold" strategy (red line) and the standard deviation of 10,000 simulations using a purely random investment strategy (dashed lines). The 
Google Trends strategy using the search volume of the term debt would have yielded a profit of 326%. 

Figure 3 | Performances of investment strategies based on search volume data. (A) Cumulative returns of 98 investment strategies based on search volumes 
restricted to search requests of users located in the United States for different search terms, displayed for the entire time period of our study from 5 January 
2004 until 22 February 2011, the time period for which Google Trends provides data. We use two shades of blue for positive returns and two shades of red for 
negative returns to improve the readability of the search terms. The cumulative performance for the "buy and hold strategy" is also shown, as is a "Dow Jones 
strategy", which uses weekly closing prices of the Dow Jones Industrial Average (DJIA) rather than Google Trends data (see gray bars). Figures provided next to 
the bars indicate the returns of a strategy, R, in standard deviations from the mean return of uncorrelated random investment strategies, <R> RandomStrategy = 
0. Dashed lines correspond to −3, −2, −1, 0, +1, +2, and +3 standard deviations of random strategies. We find that returns from the Google Trends 
strategies tested are significantly higher overall than returns from the random strategies (<R> US = 0.60; t = 8.65, df = 97, p < 0.001, one sample t-test). 
(B) A parallel analysis shows that extending the range of the search volume analysis to global users reduces the overall return achieved by Google Trends 
trading strategies on the U.S. market (<R> US = 0.60, <R> Global = 0.43; t = 2.69, df = 97, p < 0.01, two-sided paired t-test). However, returns are still 
significantly higher than the mean return of random investment strategies (<R> Global = 0.43; t = 6.40, df = 97, p < 0.001, one sample t-test). 

Figure 4 | Analysis using strategies in which we take long or short positions only, using U.S. search volume data. (A) We implement Google Trends 
strategies in which we take long positions following a decrease in search volume, and never take short positions. We find that returns from these long 
position Google Trends strategies are significantly higher overall than returns from the random investment strategies (<R> USLong = 0.41; t = 11.42, df = 
97, p < 0.001, one sample t-test). Again, we find a positive correlation between our indicator of financial relevance and returns from these strategies 
(Kendall's tau = 0.242, z = 3.53, N = 98, p < 0.001). (B) We also implement Google Trends strategies in which we take short positions following an 
increase in search volume, and never take long positions. In line with our results from the long position Google Trends strategies, we find that returns from 
the short position Google Trends strategies are significantly higher overall than returns from the random investment strategies (<R> USShort = 0.19; t = 
5.28, df = 97, p < 0.001, one sample t-test), and that there is a positive correlation between our indicator of financial relevance and short position Google 
Trends returns (Kendall's tau = 0.275, z = 4.01, N = 98, p < 0.001). 

We rank the full list of the 98 investigated search terms by their 
trading performance when using search data for U.S. users only 
(Fig. 3A) and when using globally generated search volume 
(Fig. 3B). In order to ensure the robustness of our results, the overall 
performance of a strategy based on a given search term is determined 
as the mean value over the six returns obtained for Δt = 1 ... 6 weeks. 
Returns of the strategies are calculated as the logarithm of relative 
portfolio changes, following the usual definition of returns. The dis- 
tribution of final portfolio values resulting from the random invest- 
ment strategies is close to log-normal. Cumulative returns from the 
random investment strategy, derived from the logarithm of these 
portfolio values, therefore follow a normal distribution, with a mean 
value of <R> RandomStrategy = 0. Here we report R, the cumulative 
returns of a strategy, in standard deviations of the cumulative returns 
of these uncorrelated random investment strategies. 
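
The normalization against random investment strategies could be sketched as follows. The text does not specify how random positions are drawn, so the sketch assumes a long or short position each week with equal probability; the function names, the seed, and the use of the simulated mean (which should be close to 0) are likewise assumptions.

```python
import math
import random
import statistics


def random_strategy_return(prices, rng):
    """Cumulative log return when each weekly position is long or short at random."""
    total = 0.0
    for t in range(len(prices) - 1):
        step = math.log(prices[t + 1]) - math.log(prices[t])
        # Equal probability of long and short is an assumption of this sketch.
        total += step if rng.random() < 0.5 else -step
    return total


def standardized_return(strategy_return, prices, n_sim=10_000, seed=0):
    """Express a cumulative return in standard deviations of random strategies."""
    rng = random.Random(seed)
    sims = [random_strategy_return(prices, rng) for _ in range(n_sim)]
    return (strategy_return - statistics.fmean(sims)) / statistics.stdev(sims)
```

Under these assumptions the random returns are sums of weekly log returns with independent random signs, so their mean is 0 and their standard deviation is the square root of the sum of squared weekly log returns.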

We find that returns from the Google Trends strategies we tested 
are significantly higher overall than returns from the random strat- 
egies (<R> US = 0.60; t = 8.65, df= 97, p < 0.001, one sample t-test). 
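
For readers who wish to reproduce a test statistic of this kind, a one sample t statistic against a mean of zero can be computed as in the sketch below. The function name is hypothetical, and the p value (which requires the t distribution) is deliberately left out so that the example needs only the standard library.

```python
import math
import statistics


def one_sample_t(values, mu=0.0):
    """Return the t statistic and degrees of freedom for H0: mean == mu."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are required")
    standard_error = statistics.stdev(values) / math.sqrt(n)
    t_stat = (statistics.fmean(values) - mu) / standard_error
    return t_stat, n - 1
```

Applied to the 98 standardized strategy returns, such a function would return 97 degrees of freedom, matching the value reported in the text.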

We compare the performance of these search terms with two 
benchmark strategies. The 'buy and hold' strategy is implemented 
by buying the index in the beginning and selling it at the end of the 
hold period. This strategy yields 16% profit, equal to the overall 
increase in value of the DJIA in the time period from January 2004 
until February 2011. We further implement a 'Dow Jones strategy' by 
using changes in p(t) in place of changes in search volume data as the 
basis of buy and sell decisions. We find that this strategy also yields 
only 33% profit with Δt = 3 weeks, or when determined as the mean 
value over the six returns obtained for Δt = 1 ... 6 weeks, 0.45 
standard deviations of cumulative returns of uncorrelated random 
investment strategies (Figs. 3A and 3B; see also Fig. S101 in the 
Supplementary Information). 
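
The two benchmark strategies can be sketched in the same style. The function names are hypothetical. Two points are our reading rather than statements from the text: that the Dow Jones strategy uses the same timing as the search volume strategy, with the price series taking the place of the search volume series, and that a cumulative log return R corresponds to a percentage profit of (exp(R) − 1) × 100 for a portfolio starting at a value of 1.

```python
import math


def buy_and_hold_return(prices):
    """Log return from buying at the first price and selling at the last."""
    return math.log(prices[-1]) - math.log(prices[0])


def profit_percent(cumulative_return):
    """Convert a cumulative log return into a percentage profit (assumed mapping)."""
    return (math.exp(cumulative_return) - 1.0) * 100.0


def dow_jones_strategy_return(prices, dt):
    """Same rule as the search-volume strategy, driven by past price changes."""
    total = 0.0
    for t in range(dt + 1, len(prices) - 1):
        baseline = sum(prices[t - 1 - dt:t - 1]) / dt
        change = prices[t - 1] - baseline
        step = math.log(prices[t + 1]) - math.log(prices[t])
        if change > 0:
            total -= step  # short after a price increase
        elif change < 0:
            total += step  # long after a price decrease
    return total
```

For instance, an index that rises from 100 to 116 over the holding period gives a buy and hold profit of 16% under this mapping.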

Our results show that performance of the Google Trends strategy 
differs with the search term chosen. We investigate whether these 
differences in performance can be partially explained using an indi- 
cator of the extent to which different terms are of financial rel- 
evance — a concept we quantify by calculating the frequency of 
each search term in the online edition of the Financial Times from 
August 2004 to June 2011, normalized by the number of Google hits 
for each search term (see Fig. S2 in the Supplementary Information). 
We find that the return associated with a given search term is corre- 
lated with this indicator of financial relevance (Kendall's tau = 0.275, 
z = 4.01, N = 98, p < 0.001) using Kendall's tau rank correlation 
coefficient 37 . 
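
A rank correlation of this kind can be illustrated with the following sketch, which computes Kendall's tau by counting concordant and discordant pairs and converts it to a z score with the usual normal approximation. The function names are hypothetical, and the sketch assumes data without ties (the tau-a variant); the text does not state which variant was used.

```python
import math


def kendall_tau(x, y):
    """Kendall's tau (tau-a) for two equally long sequences without ties."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length of at least 2")
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


def kendall_z(tau, n):
    """z score of tau under the null hypothesis, using the normal approximation."""
    return 3.0 * tau * math.sqrt(n * (n - 1)) / math.sqrt(2.0 * (2 * n + 5))
```

As a consistency check, kendall_z(0.275, 98) evaluates to approximately 4.01, in agreement with the z value reported above.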

It is widely recognized that investors prefer to trade on their do- 
mestic market, suggesting that search data for U.S. users only, as used 
in analyses so far, should better capture the information gathering 
behavior of U.S. stock market participants than data for Google users 
worldwide. Indeed, we find that strategies based on global search 
volume data are less successful than strategies based on U.S. search 
volume data in anticipating movements of the U.S. market (<R> US 
= 0.60, <R> Global = 0.43; t = 2.69, df = 97, p < 0.01, two-sided 
paired t-test). 

Our empirical results so far are consistent with a two part hypo- 
thesis: namely that key increases in the price of the DJIA were pre- 
ceded by a decrease in search volume for certain financially related 
terms, and conversely, that key decreases in the price of the DJIA 
were preceded by an increase in search volume for certain financially 
related terms. However, our trading strategy can be decomposed into 
two strategy components: one in which a decrease in search volume 
prompts us to buy (or take a long position) and one in which an 
increase in search volume prompts us to sell (or take a short position). 

In order to verify that both strategy components play a significant 
role in our results, such that we have evidence for both parts of this 
hypothesis, we implement and test one strategy in which we take long 
positions following a decrease in search volume but never take short 
positions (Fig. 4A), and another strategy in which we take short 
positions following an increase in search volume but never take long 
positions (Fig. 4B). We find that returns from both Google Trends 
strategy components are significantly higher overall than returns 
from a random investment strategy (long position strategies: 
<R> USLong = 0.41; t = 11.42, df = 97, p < 0.001, one sample t-test; 
short position strategies: <R> USShort = 0.19; t = 5.28, df = 97, p < 
0.001, one sample t-test). 
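
The decomposition into a long-only and a short-only component can be sketched as follows. The function name and the input layout are assumptions: changes[t] is taken to be the search volume change that determines the position held from week t to week t + 1.

```python
import math


def component_return(changes, prices, side):
    """Cumulative log return of the long-only or short-only component.

    changes[t] : search volume change used for the trade in week t (assumed layout)
    prices[t]  : closing price on the first trading day of week t
    side       : "long" or "short"
    """
    if side not in ("long", "short"):
        raise ValueError("side must be 'long' or 'short'")
    total = 0.0
    for t in range(len(prices) - 1):
        step = math.log(prices[t + 1]) - math.log(prices[t])
        if side == "long" and changes[t] < 0:
            total += step  # buy after a decrease in search volume
        elif side == "short" and changes[t] > 0:
            total -= step  # sell after an increase in search volume
    return total
```

In this sketch the long-only and short-only returns add up to the return of the combined strategy, which is in line with the reported mean values (0.41 + 0.19 = 0.60).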

Discussion 

In summary, our results are consistent with the suggestion that during 
the period we investigate, Google Trends data did not only reflect 
aspects of the current state of the economy, but may have also pro- 
vided some insight into future trends in the behavior of economic 
actors. Using historic data from the period between January 2004 and 
February 2011, we detect increases in Google search volumes for key- 
words relating to financial markets before stock market falls. Our 
results suggest that these warning signs in search volume data could 
have been exploited in the construction of profitable trading strategies. 

We offer one possible interpretation of our results within the con- 
text of Herbert Simon's model of decision making 28 . We suggest that 
Google Trends data and stock market data may reflect two subsequent 
stages in the decision making process of investors. Trends to sell on 
the financial market at lower prices may be preceded by periods of 
concern. During such periods of concern, people may tend to gather 
more information about the state of the market. It is conceivable that 
such behavior may have historically been reflected by increased 
Google Trends search volumes for terms of higher financial relevance. 

We find that strategies based on search volume data for U.S. users 
are more successful for the U.S. market than strategies using global 
search volume data. Given the assumption that the population of U.S. 
Internet users contains a higher proportion of traders on the U.S. 
markets than the worldwide population of Internet users contains, 
this finding is in line with the intriguing suggestion that these data- 
sets may provide insights into different stages of decision making 
within the same population. 

In this work, we provide a quantification of the relationship 
between changes in search volume and changes in stock market 
prices. Future work will be needed to provide a thorough explanation 
of the underlying psychological mechanisms which lead people to 
search for terms like debt before selling stocks at a lower price. It is 
clear that many opportunities also remain to extend our analyses to 
further financial data sets. 

The results of our investigation suggest that combining large beha- 
vioral data sets such as financial trading data with data on search query 
volumes may open up new insights into different stages of large-scale 
collective decision making. We conclude that these results further 
illustrate the exciting possibilities offered by new big data sets to ad- 
vance our understanding of complex collective behavior in our society. 

Methods 

How related are search terms to the topic of finance? We quantify financial 
relevance by calculating the frequency of each search term in the online edition of the 
Financial Times (http://www.ft.com) from August 2004 to June 2011, normalized by 
the number of Google hits (http://www.google.com) for each search term. Details are 
given in the Supplementary Information. 

Data retrieval. We retrieved search volume data by accessing the Google Trends 
website (http://www.google.com/trends) on 10 April 2011, 17 April 2011, and 24 
April 2011. The data on the number of hits for search terms in the online edition of the 
Financial Times was retrieved on 7 June 2011. The numbers of Google hits for these 
terms were obtained on 8 June 2011. 